Abstract We introduce a general Monte Carlo method based on Nested
Sampling (NS) for sampling complex probability distributions and
estimating the normalising constant. The method uses one or more
particles, which explore a mixture of nested probability distributions,
each successive distribution occupying $\approx e^{-1}$ times the
enclosed prior mass of the previous distribution. While NS technically
requires independent generation of particles, Markov Chain Monte Carlo
(MCMC) exploration fits naturally into this technique. We illustrate the
new method on a test problem and find that it can achieve four times the
accuracy of classic MCMC-based Nested Sampling for the same
computational effort, equivalent to a factor of 16 speedup. An
additional benefit is that more samples and a more accurate evidence
value can be obtained simply by continuing the run for longer, as in
standard MCMC.
1 Introduction

In probabilistic inference and statistical mechanics, it 
is often necessary to characterise complex multidimen- 
sional probability distributions. Typically, we have the 
ability to evaluate a function proportional to the prob- 
ability or probability density at any given point. Given 
this ability, we would like to produce sample points 
from the target distribution (to characterise our uncer- 
tainties), and also evaluate the normalising constant. 
In Bayesian Inference, a prior distribution $\pi(\theta)$ is modulated
by the likelihood function $L(\theta)$ to produce the posterior
distribution $f(\theta)$:

$$ f(\theta) = \frac{1}{Z}\, \pi(\theta) L(\theta) \tag{1} $$

where $Z = \int \pi(\theta) L(\theta)\, d\theta$ is the evidence value,
which allows the entire model to be tested against any proposed
alternative. In statistical mechanics, the goal is to produce samples
from a canonical distribution



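$$ f(\theta) = \frac{1}{Z(\lambda)}\, g(\theta) \exp\left(-\lambda E(\theta)\right) \tag{2} $$

($g(\theta)$ = density of states, $E(\theta)$ = energy) at a range of
inverse-temperatures $\lambda$, and also to obtain the normalisation
$Z(\lambda)$ as a function of $\lambda$. The challenge is to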
develop sampling algorithms that are of general appli- 
cability, and are capable of handling the following com- 
plications: multimodality, phase transitions, and strong 
correlations between variables (particularly when the 
Markov Chain Monte Carlo (MCMC) proposals are not 
aware of these correlations). These challenges have led to the
development of a large number of sophisticated and efficient techniques
(e.g. Chib & Ramamurthy, 2010; Trias, Vecchio & Veitch, 2009). However,
our goal in the present paper is to develop a method that requires
little problem-specific tuning, obviating the need for large numbers of
preliminary runs or analytical work: Diffusive Nested Sampling can be
applied to any problem where Metropolis-Hastings can be used. Finally,
we note that no sampling method can ever be com- 
pletely immune from failure; for example, it is difficult 
to imagine any method that will find an exceedingly 
small, sharp peak whose domain of attraction is also 
very small. 

1.1 Nested Sampling 

Nested Sampling (NS) is a powerful and widely applicable algorithm for
Bayesian computation (Skilling, 2006). Starting with a population of
particles $\{\theta_i\}$ drawn from the prior distribution
$\pi(\theta)$, the worst particle (lowest likelihood $L(\theta)$) is
recorded and then replaced with
a new sample from the prior distribution, subject to the 
constraint that its likelihood must be higher than that 
of the point it is replacing. As this process is repeated, 
the population of points moves progressively higher in 
likelihood. 

Each time the worst particle is recorded, it is assigned a value
$X \in [0, 1]$, which represents the amount of prior mass estimated to
lie at a higher likelihood than that of the discarded point. Assigning
X-values to points creates a mapping from the parameter space to [0, 1],
where the prior becomes a uniform distribution over [0, 1] and the
likelihood function is a decreasing function of X. Then, the evidence
can be computed by simple numerical integration, and posterior weights
can be obtained by assigning a width to each point, such that the
posterior mass associated with the point is proportional to
width × likelihood.
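
The width × likelihood integration described above can be sketched on a toy problem. Everything specific here is an assumption made for illustration: a one-dimensional uniform prior on [0, 1], a Gaussian likelihood centred at 0.5 (chosen so that the constrained prior can be sampled exactly), the deterministic assignment $X_i = \exp(-i/N)$, and the hypothetical function name `classic_ns_toy`.

```python
import math
import random


def classic_ns_toy(n_particles=100, n_iter=1500, sigma=0.01, seed=0):
    """Toy Nested Sampling run; returns an estimate of log(Z).

    Assumed toy problem: uniform prior on [0, 1] and likelihood
    L(theta) = exp(-0.5 * ((theta - 0.5) / sigma)**2).
    """
    rng = random.Random(seed)

    def log_l(theta):
        return -0.5 * ((theta - 0.5) / sigma) ** 2

    # Initial population drawn from the prior.
    particles = [rng.random() for _ in range(n_particles)]
    z = 0.0
    x_prev = 1.0
    for i in range(1, n_iter + 1):
        # Record the worst (lowest-likelihood) particle.
        worst = min(range(n_particles), key=lambda k: log_l(particles[k]))
        log_l_star = log_l(particles[worst])
        # Deterministic estimate of the enclosed prior mass.
        x_i = math.exp(-i / n_particles)
        # Width x likelihood contribution to the evidence.
        z += math.exp(log_l_star) * (x_prev - x_i)
        x_prev = x_i
        # In this toy problem L > L* is the interval |theta - 0.5| < r,
        # so the constrained prior can be sampled exactly.
        r = abs(particles[worst] - 0.5)
        particles[worst] = 0.5 + rng.uniform(-r, r)
    return math.log(z)


if __name__ == "__main__":
    print("estimated log Z:", classic_ns_toy())
    print("analytic  log Z:", math.log(0.01 * math.sqrt(2.0 * math.pi)))
```

In realistic problems the exact constrained draw on the last line of the loop is not available, which is the difficulty the rest of this section addresses.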

The key challenge in implementing Nested Sampling for real problems is
to be able to generate the new particle from the prior, subject to the
hard likelihood constraint. If the discarded point has likelihood $L^*$,
the newly generated point should be sampled from the constrained
distribution:

$$ p_{L^*}(\theta) = \frac{\pi(\theta)}{X^*} \times \begin{cases} 1, & L(\theta) > L^* \\ 0, & \text{otherwise,} \end{cases} \tag{3} $$

where $X^*$ is the normalising constant. Technically, our knowledge of
this new point should be independent of all of the surviving points. A
simple way to generate such a point is suggested by Sivia & Skilling
(2006): copy one of the surviving points and evolve it via MCMC with
respect to the prior distribution, rejecting proposals that would take
the likelihood below the current cutoff value $L^*$. This evolves the
particle with Equation 3 as the target distribution. If the MCMC is done
for long enough, the new point will be effectively independent of the
surviving population and will be distributed according to Equation 3.
Throughout this paper we refer to this strategy as "classic" Nested
Sampling.
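
A minimal sketch of the MCMC strategy just described, assuming a uniform prior on $[-0.5, 0.5]$ in each dimension and a single-coordinate Gaussian random-walk proposal. The function name `constrained_mcmc`, its arguments and the fixed step size are illustrative assumptions, not part of the method's specification.

```python
import random


def constrained_mcmc(theta, log_l, log_l_star, n_steps, step, rng):
    """Evolve theta with the constrained prior as the target.

    Assumes a uniform prior on [-0.5, 0.5]^d, so a symmetric proposal
    that is rejected outside the box leaves the prior invariant.
    """
    theta = list(theta)  # work on a copy of the surviving point
    for _ in range(n_steps):
        proposal = list(theta)
        i = rng.randrange(len(theta))
        proposal[i] += step * rng.gauss(0.0, 1.0)
        if not -0.5 <= proposal[i] <= 0.5:
            continue  # outside the prior support: reject
        if log_l(proposal) > log_l_star:
            theta = proposal  # hard likelihood constraint satisfied
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    log_l = lambda x: -0.5 * sum(v * v for v in x) / 0.01
    print(constrained_mcmc([0.0, 0.0], log_l, -2.0, 500, 0.05, rng))
```

Because every accepted move satisfies the constraint, the returned point always lies inside the region $L(\theta) > L^*$; whether it is effectively independent of the starting point depends on `n_steps` and `step`.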



However, in complex problems, this approach can 
easily fail - constrained distributions can often be very 
difficult to efficiently explore via MCMC, particularly 
if the target distribution is multimodal or highly correlated. To
overcome these drawbacks, several methods have been developed for
generating the new particle (Mukherjee et al., 2006; Feroz, Hobson &
Bridges, 2008) and these methods have been successful in improving the
performance of Nested Sampling, at least in low-dimensional problems.
Techniques for diagnosing multimodality have also been suggested by
Pártay, Bartók & Csányi (2009). In Section 2 we introduce our
multi-level method which retains the flexibility and generality of MCMC
exploration, and is also capable of efficiently exploring difficult
constrained distributions.

The main advantage of Nested Sampling is that successive constrained
distributions (i.e. $p_{L^*_j}(\theta)$, $p_{L^*_{j+1}}(\theta)$, and so
on) are, by construction, all compressed by the same factor relative to
their predecessors. This is not the case with tempered distributions of
the form $p_T(\theta) \propto \pi(\theta) L(\theta)^{1/T}$, where a
small change in temperature T can correspond to a small or a large
compression. Tempering-based algorithms (e.g. simulated annealing,
parallel tempering) will fail unless the density of temperature levels
is adjusted according to the specific heat, which becomes difficult at a
first-order phase transition (the uniform-energy sampling of Wang &
Landau, 2001, is also incapable of handling first-order phase
transitions). Unfortunately, knowing the appropriate values for the
temperature levels is equivalent to having already solved the problem.
Nested Sampling does not suffer from this issue because it asks the
question "what should the next distribution be, such that it will be
compressed by the desired factor?", rather than the tempering question
"the next distribution is pre-defined, how compressed is it relative to
the current distribution?"



2 Multi-Level Exploration 

Our algorithm begins by generating a point from the prior
($\pi(\theta) = p_{L^*_0}(\theta)$, where $L^*_0 = 0$), and evolving it
via MCMC (or independent sampling), storing all of the likelihood values
of the visited points. After some predetermined number of iterations, we
find the $1 - e^{-1} \approx 0.63212$ quantile of the accumulated
likelihoods, and record it as $L^*_1$, creating a new level that
occupies about $e^{-1}$ times as much prior mass as $p_{L^*_0}$.
Likelihood values less than $L^*_1$ are then removed from the
accumulated likelihood array.
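
The level-creation step can be sketched as follows. The function name `create_level` and the simple index-based quantile rule are assumptions made for illustration; any consistent quantile definition would serve.

```python
import math


def create_level(log_likelihoods):
    """Return (new threshold, surviving likelihood values).

    The threshold is the 1 - 1/e quantile of the accumulated values;
    values below it are dropped, so about a fraction 1/e survives.
    """
    ordered = sorted(log_likelihoods)
    index = int((1.0 - math.exp(-1.0)) * len(ordered))
    threshold = ordered[index]
    survivors = [v for v in ordered if v >= threshold]
    return threshold, survivors


if __name__ == "__main__":
    threshold, survivors = create_level([float(v) for v in range(10000)])
    print(threshold, len(survivors) / 10000.0)
```

The surviving values remain in the array and count towards the next level, which is how every visited likelihood contributes to level placement.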

Next, classic Nested Sampling would attempt to sample $p_{L^*_1}$ via
MCMC. The difference here is that we attempt to sample a mixture
(weighted sum) of $p_{L^*_1}$ and $p_{L^*_0}$. Thus, there is some
chance of the particle escaping to lower constrained distributions,
where it is allowed to explore more freely. Once we have enough samples
from $p_{L^*_1}$ (equivalent to all points from the mixture that happen
to exceed $L^*_1$), we find the $1 - e^{-1} \approx 0.63212$ quantile of
these likelihoods and record it as $L^*_2$. Likelihood values less than
$L^*_2$ are then removed from the accumulated likelihood array. The
particle then explores a mixture of $p_{L^*_0}$, $p_{L^*_1}$ and
$p_{L^*_2}$, and so on.

Each time we create a new level, its corresponding constrained
distribution covers about $e^{-1}$ as much prior mass as the last. Thus,
we can estimate the X-value of the kth level created as being
$\exp(-k)$. This estimate can be refined later on, as explained in
Section 3.

Once we have obtained the desired number of levels, we allow the
particle to continue exploring a mixture of all levels. This multi-level
exploration scheme is similar to simulated tempering (Marinari & Parisi,
1992), but using the well-tuned $L^*$ values from the run, rather than
using a pre-defined sequence of temperatures that may be poorly adapted
to the problem at hand.



2.1 Weighting Schemes 

The use of a mixture of constrained distributions raises the question:
how should we weight each of the mixture components? Naive uniform
weighting works, but takes time proportional to $N^2$ to create N
levels; contrast this with classic Nested Sampling, which has O(N)
performance in this regard. A simple parametric family of weighting
schemes is the exponentially-decaying weights:

$$ w_j \propto \exp\left(\frac{j}{\Lambda}\right), \quad \text{for } j \in \{1, 2, \ldots, J\} \tag{4} $$

where J is the current highest level, and $\Lambda$ is a scale length,
describing how far we are willing to let the particle "backtrack" to
freer distributions, in order to assist with exploration and the
creation of a new, higher level. With this exponential choice, the time
taken to create N levels is proportional to N, the proportionality
constant being dependent on $\Lambda$. Smaller values are more
aggressive, and larger values, while slower, are more fail-safe because
they allow the particle to explore for longer, and more freely.
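
As a small illustration of the exponentially-decaying weights of Equation 4, the sketch below normalises them over levels 1 to J. The function name `level_weights` is hypothetical, and the exponent is written as (j - J)/Λ purely to avoid overflow for large J; subtracting the constant J leaves the normalised weights unchanged.

```python
import math


def level_weights(top_level, backtrack_scale):
    """Normalised weights w_j proportional to exp(j / Lambda), j = 1..J."""
    raw = [math.exp((j - top_level) / backtrack_scale)
           for j in range(1, top_level + 1)]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    for scale in (1.0, 10.0):
        weights = level_weights(20, scale)
        # Fraction of the weight carried by the top five levels.
        print(scale, sum(weights[-5:]))
```

With a small scale almost all of the weight sits on the top few levels (aggressive), while a larger scale spreads it further down, matching the trade-off described above.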

Once the desired number of levels has been created, the weights
$\{w_j\}$ are changed to uniform, to allow further samples to be drawn.
Non-uniform weights can also be used, for example to spend more time at
levels that are significant for the posterior. The simulation can be run
for as long as required, and the evidence and posterior samples will
converge in a manner analogous to standard MCMC. Additional separate
runs are not required, provided that enough levels have been created.



2.2 Exploring the mixture 

Suppose we have obtained a set of constrained distributions
$p_{L^*_j}(\theta)$ from the algorithm described above (Section 2). Each
of the constrained distributions is defined as follows:

$$ p_{L^*_j}(\theta) = \frac{\pi(\theta)}{X_j} \times \begin{cases} 1, & L(\theta) > L^*_j \\ 0, & \text{otherwise.} \end{cases} \tag{5} $$



The particle in our simulation, $\theta$, is assigned a label j
indicating which particular constrained distribution it is currently
representing. j = 0 denotes the prior, and j = 1, 2, 3, ..., J denote
progressively higher likelihood cutoffs, where j = J is the current top
level. To allow $\theta$ to explore the mixture, we update
$\{\theta, j\}$ so as to explore the joint distribution

$$ p(\theta, j) = p(j)\, p(\theta \mid j) = \frac{w_j}{X_j}\, \pi(\theta) \times \begin{cases} 1, & L(\theta) > L^*_j \\ 0, & \text{otherwise,} \end{cases} \tag{6} $$

where $p(j) = w_j$ is the chosen weighting scheme discussed in
Section 2.1 and $X_j$ is the normalising constant for
$p(\theta \mid j)$, which in general depends on j. In fact, $X_j$ is
simply the amount of prior mass enclosed at a likelihood above $L^*_j$,
and is the abscissa in the standard presentation of Nested Sampling
(Skilling, 2006); hence the notation X rather than the usual Z for a
normalising constant.

Updating $\theta$ is done by proposing a change that keeps the prior
$\pi(\theta)$ invariant, and then accepting as long as the likelihood
exceeds $L^*_j$. To update j, we propose a new value from some proposal
distribution (e.g., move either up or down one level, with 50%
probability) and accept using the usual Metropolis rule, with Equation 6
as the target distribution. This cannot be done without knowing the
$\{X_j\}$; however, our procedure for adding new levels ensures that the
ratio of the X's for neighbouring levels is approximately $e^{-1}$. So
we at least have some approximate values for the $\{X_j\}$, and these
can be used when updating j. However, since our creation of new levels
is based on sampling, and therefore not exact, the ratios of X's will be
a little different from the theoretical expected value $e^{-1}$. Hence,
we will actually be exploring Equation 6 with a marginal p(j) that is
slightly different from the weights $\{w_j\}$ that we really wanted.
However, we can further revise our estimates of the X's at each step to
achieve more correct exploration; our method for doing this is explained
in Section 3. This refinement not only allows the particle to explore
the levels with the desired weighting, but also increases the accuracy
of the resulting evidence estimate.
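
The update of the level label j can be sketched as a Metropolis step on the joint distribution, whose j-dependence is $w_j / X_j$ times the indicator that the current likelihood exceeds $L^*_j$. The function name `update_level`, the use of log-space arrays, and the one-level-up-or-down proposal are assumptions for illustration; level 0 is given a threshold of minus infinity to represent the unconstrained prior.

```python
import math
import random


def update_level(j, log_l_theta, log_thresholds, log_x, log_w, rng):
    """One Metropolis update of the level label j at fixed theta."""
    proposal = j + rng.choice((-1, 1))
    if proposal < 0 or proposal >= len(log_thresholds):
        return j  # outside the range of existing levels: reject
    if log_l_theta <= log_thresholds[proposal]:
        return j  # theta does not satisfy the proposed level's constraint
    # Ratio of (w / X) between the proposed and current levels.
    log_ratio = ((log_w[proposal] - log_x[proposal])
                 - (log_w[j] - log_x[j]))
    if math.log(1.0 - rng.random()) < log_ratio:
        return proposal
    return j


if __name__ == "__main__":
    rng = random.Random(1)
    thresholds = [-math.inf, 0.0, 1.0]
    log_x = [0.0, -1.0, -2.0]   # theoretical values, X_j = exp(-j)
    log_w = [0.0, 0.0, 0.0]     # uniform weights (unnormalised)
    j, visits = 0, [0, 0, 0]
    for _ in range(20000):
        j = update_level(j, 5.0, thresholds, log_x, log_w, rng)
        visits[j] += 1
    print([v / 20000.0 for v in visits])
```

At fixed $\theta$ with a likelihood above every threshold, the visit fractions approach $w_j / X_j$ normalised over the levels, so the particle is drawn upwards; it is the $\theta$ updates, which fail the higher constraints more often, that restore the marginal p(j) to the intended weights.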

Note that it is possible to explicitly marginalise over j and explore a
distribution for $\theta$ only; however, it is convenient to retain j
for various purposes, most notably the refinements in Sections 3 and
3.1.

[Figure 1: panels Step 1, Step 2 and Step 3, shown for Classic NS and
for Diffusive NS.]

Fig. 1 An illustration of the distributions that must be sampled as
Nested Sampling progresses. In the classic scheme, at Step 3, we must
obtain a sample from the coloured region, which is composed of two
separate islands; this is usually very difficult if MCMC is the only
exploration option. To ameliorate these difficulties, we explore the
mixture distribution (bottom right), where travel between isolated modes
is more likely.



3 Revising the X's

As sampling progresses and more levels are added, the actual X-values of
the levels can become different from the theoretical expectation
$X_{j+1} = e^{-1} X_j$ which would be realised if our sampling were
perfect. This causes the simulation to explore a p(j) that is very
different from the desired weights $\{w_j\}$. Fortunately, we can use
details of the exploration to obtain estimates of the X's that are more
accurate than those given by the theoretical expectation
$X_{j+1} = e^{-1} X_j$. Conditioning on a particular level j, the
particle's likelihood should exceed level j+1's likelihood cutoff a
fraction $X_{j+1}/X_j$ of the time. Thus, we can use the actual fraction
of exceedances $n(L > L^*_{j+1} \mid j)/n(j)$ as an estimate of the true
ratio of normalising constants for consecutive levels. In practice, we
only keep track of the exceedance fraction for consecutive levels, and
use the theoretical expected value $e^{-1}$ to stabilise the estimate
when the number of visits n(j) is low:

$$ \frac{X_{j+1}}{X_j} = \frac{n(L > L^*_{j+1} \mid j) + C e^{-1}}{n(j) + C} \tag{7} $$

Here, the number C represents our effective confidence in the
theoretical expected value relative to the empirical estimate, and thus
resembles a Bayesian estimate. The theoretical estimate $e^{-1}$
dominates the estimate until the sample size n(j) exceeds C, whereupon
the empirical estimate becomes more important. Clearly, n(j) should not
be accumulated until level j+1 exists. An additional refinement of this
approach can be made by realising that the particle, even if it is at
level j, may also be considered as sampling higher distributions if its
likelihood happens to exceed the higher levels' thresholds.



3.1 Enforcing the Target Weighting 

Note that the ratio of normalising constants between level j and level
j+1 is estimated using only samples at level j. Sometimes, if the upper
levels have been created incorrectly (typically they are too closely
spaced in log X), the particle spends too much time in the upper levels,
rarely spending time in the lower levels, which would be needed to
correct the erroneous X-values.
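
The regularised ratio estimate of Equation 7, built from counts gathered at level j alone, can be sketched as below. The function names `compression_ratio` and `refined_log_x`, and the list-based bookkeeping, are assumptions for illustration.

```python
import math


def compression_ratio(n_exceed, n_visits, confidence):
    """Estimate X_{j+1} / X_j from the counts collected at level j.

    n_exceed:   visits to level j whose likelihood exceeded L*_{j+1}
    n_visits:   total visits to level j (counted once level j+1 exists)
    confidence: the regularisation constant C
    """
    return (n_exceed + confidence * math.exp(-1.0)) / (n_visits + confidence)


def refined_log_x(exceed_counts, visit_counts, confidence):
    """Accumulate log X for each level, starting from log X_0 = 0."""
    log_x = [0.0]
    for n_exceed, n_visits in zip(exceed_counts, visit_counts):
        ratio = compression_ratio(n_exceed, n_visits, confidence)
        log_x.append(log_x[-1] + math.log(ratio))
    return log_x


if __name__ == "__main__":
    # With no data, the estimate falls back to the theoretical spacing.
    print(refined_log_x([0, 0], [0, 0], 1000))
    # With many visits, the empirical fraction dominates.
    print(compression_ratio(500000, 1000000, 1000))
```

Because each ratio depends only on visits to level j, a level that is rarely visited keeps an estimate close to $e^{-1}$ regardless of its true compression, which is the failure mode this subsection sets out to address.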

To work around this issue, we also keep track of the number of visits to
each level relative to the expected number of visits, and add a term to
the acceptance probability for moving j, to favour levels that haven't
been visited as often as they should have. If a move is proposed from
level j to level k, the Metropolis-Hastings acceptance probability is
multiplied by the following factor:

$$ A_{\rm enforce} = \left( \frac{(n_j + C)/(\langle n_j \rangle + C)}{(n_k + C)/(\langle n_k \rangle + C)} \right)^{\beta} \tag{8} $$

where $n_j$ and $n_k$ are the numbers of visits to levels j and k
respectively (these are different from n(j) in Section 3, as they can be
accumulated immediately, without needing a higher level to exist), and
$\langle n_j \rangle$ and $\langle n_k \rangle$ are the corresponding
expected numbers of visits. Thus, the transition is favoured if it moves
to a level k that has not been visited as often as it should have,
relative to $\langle n_k \rangle$. This procedure is analogous to the
approach used by Wang & Landau (2001) to sample a uniform distribution
of energies. These expected numbers of visits are tracked throughout the
entire simulation; they are essentially the integrated values of the
weights $\{w_j\}$ over the history of the simulation. The power $\beta$
controls the strength of the effect, and C again acts to regularise the
effect when the number counts are low.
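
The multiplicative factor of Equation 8 is simple to evaluate; the sketch below uses a hypothetical function name `enforce_factor` and passes the actual and expected visit counts explicitly.

```python
def enforce_factor(n_j, expected_j, n_k, expected_k, confidence, beta):
    """Factor multiplying the acceptance probability of a move j -> k.

    n_j, n_k:               actual numbers of visits to levels j and k
    expected_j, expected_k: expected numbers of visits to those levels
    confidence:             the regularisation constant C
    beta:                   the power controlling the strength of the effect
    """
    over_j = (n_j + confidence) / (expected_j + confidence)
    over_k = (n_k + confidence) / (expected_k + confidence)
    return (over_j / over_k) ** beta


if __name__ == "__main__":
    # Level j over-visited, level k under-visited: the move is favoured.
    print(enforce_factor(2000, 1000, 500, 1000, confidence=1000, beta=10))
    # Both levels visited exactly as expected: no effect.
    print(enforce_factor(1000, 1000, 1000, 1000, confidence=1000, beta=10))
```

With a power of 10, even a modest imbalance in visit counts changes the acceptance probability by orders of magnitude, so the regularisation constant matters most early in the run when the counts are small.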
This procedure (and the one in Section 3) destroys the Markov property,
but this effect decreases towards zero as the simulation proceeds, so
the eventual convergence of the algorithm is not affected (this is
analogous to the kind of tuning discussed by Roberts, Gelman & Gilks,
1997; Rosenthal, 2010). In practice, any biases introduced by this
procedure (even with a strong choice of $\beta = 10$) have been found to
be negligible relative to the dominant source of error, which is
incorrect estimates of the X's of the levels.



4 Assigning X-values to samples

A diffusive nested sampling run explores the joint distribution of
Equation 6 in an MCMC-style way. Usually this involves a lot of steps,
and it is wasteful (of disk space and I/O overhead) to save them all;
therefore it is necessary to "thin the chain". This results in an output
sample of $\theta$ points. To use them for calculating the evidence and
posterior quantities, these points must be assigned X-values.

Firstly, each $\theta$ in the sample is assigned an interval of possible
X-values, by finding the two levels whose likelihoods sandwich the
particle. The conditional distribution of the X-values of points given
that they lie in some interval is uniform, so we can assign X-values by
generating uniform random variates between the X-values of the two
sandwiching levels. This probabilistic assignment of X-values to the
samples gives error bars on the evidence and posterior quantities, as
classic NS does. However, it assumes that the X-values of the levels are
known exactly, which they are not. Unfortunately, this error tends to be
more significant than the uncertainty of not knowing where each particle
lies within its interval, and the error bars are over-optimistic as a
result. Obtaining reliable error bars from MCMC-based computation
remains difficult. Possible methods for assigning uncertainties to the
X-values of the levels would make use of the number counts
$n(L > L^*_{j+1} \mid j)$; however, such a scheme has not been
implemented at the time of writing.



5 Test Problem

In this section we demonstrate the diffusive Nested Sampling method on a
test problem that is quite simple, yet has enough complications to shed
some light on the differences between the diffusive and classic NS
algorithms. The problem is adapted from Skilling (2006)'s "Statistics
Problem" but modified to be bimodal. The parameter space is
20-dimensional, and the prior is uniform between [-0.5, 0.5] in each
dimension:

$$ \pi(x_1, x_2, \ldots, x_{20}) = \prod_{i=1}^{20} \begin{cases} 1, & x_i \in [-0.5, 0.5] \\ 0, & \text{otherwise} \end{cases} \tag{9} $$

The likelihood function is the sum of two Gaussians, one centred at the
origin and having width v = 0.1 in each dimension, and the other centred
at (0.031, 0.031, ..., 0.031) and having width u = 0.01. The narrower
peak is weighted so that it contains 100 times as much posterior mass as
the broad peak. The likelihood function is:

$$ L(x_1, x_2, \ldots, x_{20}) = \prod_{i=1}^{20} \frac{\exp\left(-\frac{1}{2}(x_i/v)^2\right)}{v\sqrt{2\pi}} \tag{10} $$

$$ \qquad\qquad + \; 100 \prod_{i=1}^{20} \frac{\exp\left(-\frac{1}{2}((x_i - 0.031)/u)^2\right)}{u\sqrt{2\pi}} \tag{11} $$

For this problem the true value of the log evidence is
log(101) ≈ 4.6151, to a very good approximation. Skilling (2006) showed
that this system has a phase transition which is handled easily by
classic Nested Sampling. By shifting the dominant mode off-centre, the
problem now has a phase transition and is also bimodal. The bimodality
allows us to illustrate an interesting effect where imperfect MCMC
exploration causes the dominant mode to be discovered late
(Section 5.1).

The MCMC transitions used throughout this section were all naive random
walk transitions, centred on the current value, with one of the x's
chosen at random to be updated. To work around the fact that the optimal
step size changes throughout the run, we draw the step size S randomly
from a scale-invariant Jeffreys prior $\propto 1/S$ at each step, with
an upper limit for S of $10^0 = 1$ and a lower limit of $10^{-6}$. The
proposal distribution used to change j was a Gaussian centred at the
current value, with standard deviation S' which was chosen from a
Jeffreys prior between 1 and 100 at each step. The proposed j was then,
of course, rounded to an integer value and rejected immediately if it
fell outside of the allowed range of existing levels. A more
sophisticated approach to setting the step size would be to use Slice
Sampling (Neal, 2003).



5.1 Typical Performance of Diffusive NS 

To illustrate the typical output from a diffusive Nested Sampling run,
we executed the program on the test problem, using the numerical
parameters defined in Table 1. The progress of the algorithm is
displayed graphically in Figure 2. Diffusive NS has two distinct stages:
the initial mostly-upward progress of the particle, and then an
exploration stage (in this case, exploring all levels uniformly) which
can be run indefinitely, and is constantly generating new samples and
refining the estimates of the level X-values.

Table 1 Parameter values used for diffusive nested sampling on the test
problem.

Parameter                                                     Value
Number of particles                                           1
Number of likelihoods needed to create new level              10,000
Interval between particle saves                               10,000
Maximum number of levels                                      100
Regularisation constant (C)                                   1,000
Degree of backtracking ($\Lambda$)                            10
Strength of effect to enforce exploration weights ($\beta$)   10

Of particular note is the fact that, during the uniform exploration
phase, the particle can easily mix between levels 55 and 100, and
between 0 and 55, but jumps between these regions occur more rarely.
This occurs around where the narrow peak in the posterior becomes
important; imperfect MCMC exploration does not notice its presence right
away. See Figure 3 for more information about this phenomenon. The final
log-likelihood vs prior-mass curve, and the output samples, are shown in
Figure 4.



5.2 Empirical comparisons with Classic NS 

It is of interest to evaluate the performance of the diffu- 
sive NS method in comparison to classic NS. Is it less, 
equally, or more efficient? In particular, how well do the 
two methods cope with imperfect MCMC exploration? 

In this section we explore this issue, albeit in a limited way. We ran
diffusive and classic NS on the test problem defined in Section 5, with
the parameters of the algorithms chosen to be as similar as possible.
Both algorithms were limited to $10^7$ likelihood evaluations for all of
the tests. The parameters for the diffusive NS runs are shown in
Table 1, and the classic NS parameters were chosen such that the
programs had reached log X = -100 by the time the allotted $10^7$
likelihood evaluations had occurred. These parameters are listed in
Table 2. The number of steps parameter listed in Table 2 for classic NS
defines how many MCMC steps were used in an attempt to equilibrate a
particle. The implementation of the classic Nested Sampling algorithm
was the standard one where a particle is copied before being evolved.

Each of these algorithms was run 24 times, and the root mean square
error of the estimated (posterior mean) evidence values was calculated.
These results are shown in Table 2 and Figure 5, and show a significant
advantage to diffusive NS, by a factor of about four even compared to
the most favourable classic NS parameters. To obtain similar accuracy
with classic NS would have required 16 times as much computation.
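
The step from a factor of four in accuracy to a factor of 16 in computation can be made explicit if one assumes the usual Monte Carlo scaling, in which the RMS error $\sigma$ falls as the inverse square root of the computational effort $T$. That scaling, and the symbols $\sigma$ and $T$, are assumptions introduced here for illustration rather than something measured in these tests.

```latex
\sigma(T) \propto T^{-1/2}
\quad\Longrightarrow\quad
\frac{\sigma(T_{\rm classic})}{\sigma(T'_{\rm classic})}
  = \sqrt{\frac{T'_{\rm classic}}{T_{\rm classic}}} = 4
\quad\Longrightarrow\quad
T'_{\rm classic} = 4^2 \, T_{\rm classic} = 16 \, T_{\rm classic}
```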

This improvement can be attributed to diffusive NS's 
use of all visited likelihoods when creating a new level. 
Classic NS uses only the likelihood of the endpoint of 
the equilibration process, yet presumably all of the like- 
lihoods encountered during the exploration are relevant 
for creating a new level, or for judging the compression 
of an existing level. 

Another feature of diffusive NS that probably con- 
tributes to its efficiency is the fact that the particle can 
backtrack and explore with respect to broader distri- 
butions. This is particularly important when the target 
distribution is correlated, yet the proposal distribution 
is not. Then, the particle can fall down several levels, 
take larger steps across to the other side of the corre- 
lated target distribution, and then climb back up to the 
target distribution. 

For classic NS with a single particle, the RMS error is comparable with
the theoretical $\sqrt{H/N} \approx \sqrt{63.2/N}$ error bar of classic
NS. The RMS error decreases with more particles, as expected (Murray,
2007), but fails to fall off as $1/\sqrt{N}$, eventually increasing
again. This is
because the number of MCMC steps per iteration had 
to be decreased to compensate for the computational 
overhead of the increase in the number of particles, and 
the theoretical error bars assume that the exploration is 
perfect. While it may be possible to improve classic NS 
by attempting to retain the diversity of the initial pop- 
ulation (sometimes evolving the deleted particle, rather 
than copying), this has not been implemented yet, and 
thus the comparison presented in this paper is against 
the common implementation of classic NS. 
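
As a quick arithmetic check, the theoretical error bar $\sqrt{H/N}$ with the value H ≈ 63.2 quoted above can be evaluated for the particle numbers used in Table 2. The function name `theoretical_error` is hypothetical; the results agree with the "Theoretical" column of Table 2 to within rounding.

```python
import math


def theoretical_error(information, n_particles):
    """Classic NS error bar on log(Z): sqrt(H / N)."""
    return math.sqrt(information / n_particles)


if __name__ == "__main__":
    for n in (1, 10, 100, 300):
        print(n, theoretical_error(63.2, n))
```

The measured RMS errors track this prediction only for small N; for 100 and 300 particles they are several times larger, consistent with the explanation that the exploration per iteration was no longer adequate.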



5.3 Multi-Particle Diffusive NS 

On highly correlated and multimodal target distributions, two strategies
suggest themselves. One is to make $\Lambda$ large, so that the particle
can backtrack a long way. Then, when the particle reaches the top level
again, it is likely to be very weakly correlated with the original
point. A more efficient alternative would be to run diffusive NS with
multiple particles, evolving them in succession or choosing one at
random at each step. If a particle gets stuck in a subordinate mode,
this can be detected by its repeated failure to be within a distance of
a few times $\Lambda$ of the top level J. Such particles can simply be
deleted, and more CPU time can be used for evolving the surviving
particles. This approach performs better than the copying method of
classic NS because particles are only deleted when they are known to be
failing, whereas in classic NS the copying operation will inevitably
destroy the diversity of the initial population even when the MCMC
exploration is satisfactory.

Fig. 2 The anatomy of a diffusive-NS run. Initially, the particle tends
to move upwards, creating new likelihood levels. During this phase, the
particle can backtrack somewhat, allowing freer exploration and also
refining the estimates of the compression ratios of the newly created
levels. Once the desired number of levels (in this case, 100) have been
created, the particle attempts to explore all levels with the prescribed
weights, in this case uniform weights.

[Figure 3: horizontal axis Level (j).]

Fig. 3 Estimated compression from each level's distribution to the next.
If sampling were perfect, the log(X) difference between successive
levels would be -1. This is approximately correct except around levels
50-55, where some levels were "accidentally" created too close together.
This is because the slowly-exploring particle failed to notice the
presence of the narrow peak immediately, but in the meantime still
created some new levels.

Table 2 The parameters for the test runs of the algorithms, and the
resulting RMS error (from 24 runs of each algorithm) of the log
evidence. Each algorithm was allowed $10^7$ likelihood evaluations.
Diffusive NS outperformed classic NS significantly on this problem.

Algorithm     Parameter Values                               RMS Deviation in log(Z)   Theoretical $\sqrt{H/N}$
Diffusive NS  As per Table 1                                 0.583                     -
Classic NS    1 particle, 100,000 MCMC steps per NS step     5.82                      7.95
Classic NS    10 particles, 10,000 MCMC steps per NS step    2.96                      2.51
Classic NS    100 particles, 1,000 MCMC steps per NS step    2.07                      0.80
Classic NS    300 particles, 333 MCMC steps per NS step      2.71                      0.46

Fig. 4 Estimated log-likelihood curve for the test problem, based on a
diffusive Nested Sampling run. The blue curve is constructed from the
estimated X-values of the levels, while the circles represent sample
points. The chain has been thinned further in order to show the discrete
points more clearly.

Fig. 5 Performance of five different methods on the test problem. The
first method (leftmost box and whisker plot) is diffusive NS, the others
are classic NS in increasing order of the number of particles. The bars
indicate the spread of results obtained from repeated runs of each
method with different random number seeds. The horizontal line indicates
the true value.

When running multi-particle diffusive NS, it is advisable to reset the
counters (from Sections 3 and 3.1) once the desired number of levels
have been created. This prevents the deleted particles from creating
unnecessary barriers to exploration.

6 Conclusions 

In this paper we introduced a variant of Nested Sam- 
pling that uses MCMC to explore a mixture of progres- 
sively constrained distributions. This method shares the 
advantages of the classic Nested Sampling method, but
has several unique features. Firstly, once the desired 
number of constrained distributions have been created, 
the particle is allowed to diffusively explore all levels, 
perpetually obtaining more posterior samples and refin- 
ing the estimate of the evidence. Secondly, this method 
uses the entire exploration history of the particle in or- 
der to estimate the enclosed prior mass associated with 
each level, and hence tends to estimate the enclosed 
prior mass of each level more accurately than the clas- 
sic Nested Sampling method. We ran simple tests of 
the new algorithm on a test problem, and found that 
diffusive Nested Sampling gives more accurate evidence
estimates for the same computational effort. Whether 
this will hold true in general, or is a problem-dependent 
advantage, will be explored in the future. 